Search CORE

22 research outputs found

Optimal web-scale tiering as a flow problem

Author: Leung Gilbert
Quadrianto Novi
Smola Alexander
Tsioutsiouliklis Kostas
Publication venue: Curran Associates, Inc.
Publication date: 01/01/2010

We present a fast online solver for large-scale parametric max-flow problems as they occur in portfolio optimization, inventory management, computer vision, and logistics. Our algorithm solves an integer linear program in an online fashion. It exploits total unimodularity of the constraint matrix and a Lagrangian relaxation to solve the problem as a convex online game. The algorithm generates approximate solutions of max-flow problems by performing stochastic gradient descent on a set of flows. We apply the algorithm to optimize the tier arrangement of over 84 million web pages on a layered set of caches to serve an incoming query stream optimally.

CiteSeerX

Sussex Research Online

Cut Tree Algorithms

Author: Andrew V. Goldberg
Kostas Tsioutsiouliklis

This is an experimental study of algorithms for the cut tree problem. We study the Gomory-Hu and Gusfield algorithms as well as heuristics aimed at making the former faster. We develop an efficient implementation of the Gomory-Hu algorithm. We also develop problem families for testing cut tree algorithms. In our tests, the Gomory-Hu algorithm with the right combination of heuristics was significantly more robust than Gusfield's algorithm.

CiteSeerX
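
To make the cut tree problem in the abstract above concrete, here is a minimal sketch of a contraction-free cut tree construction in the style usually attributed to Gusfield. It is not the implementation from the paper: the function names (`min_cut`, `cut_tree`, `tree_min_cut`), the dense capacity-matrix representation, and the choice of Edmonds-Karp as the max-flow routine are all assumptions made for illustration, and the code is only suitable for small undirected graphs.

```python
from collections import deque


def min_cut(cap, s, t):
    """Edmonds-Karp max-flow on a symmetric capacity matrix.

    Returns (flow value, set of nodes on the s side of a minimum s-t cut).
    """
    n = len(cap)
    res = [row[:] for row in cap]  # residual capacities
    flow = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        prev = [-1] * n
        prev[s] = s
        queue = deque([s])
        while queue and prev[t] == -1:
            u = queue.popleft()
            for v in range(n):
                if prev[v] == -1 and res[u][v] > 0:
                    prev[v] = u
                    queue.append(v)
        if prev[t] == -1:
            # No augmenting path: visited nodes form the s side of the cut.
            return flow, {v for v in range(n) if prev[v] != -1}
        # Find the bottleneck along the path, then augment.
        bottleneck = float("inf")
        v = t
        while v != s:
            bottleneck = min(bottleneck, res[prev[v]][v])
            v = prev[v]
        v = t
        while v != s:
            res[prev[v]][v] -= bottleneck
            res[v][prev[v]] += bottleneck
            v = prev[v]
        flow += bottleneck


def cut_tree(cap):
    """Build a cut tree with n - 1 max-flow calls and no graph contraction.

    Returns (parent, weight): node v > 0 is joined to parent[v] by a tree
    edge of weight[v]; node 0 is the root.
    """
    n = len(cap)
    parent = [0] * n
    weight = [0] * n
    for s in range(1, n):
        t = parent[s]
        weight[s], side = min_cut(cap, s, t)
        # Re-hang later nodes that share parent t but fall on the s side.
        for v in range(s + 1, n):
            if v in side and parent[v] == t:
                parent[v] = s
    return parent, weight


def tree_min_cut(parent, weight, u, v):
    """Minimum u-v cut value: the lightest edge on the tree path u..v."""
    best = float("inf")
    while u != v:
        # parent[x] < x always holds here, so lift the larger index.
        if u > v:
            best = min(best, weight[u])
            u = parent[u]
        else:
            best = min(best, weight[v])
            v = parent[v]
    return best


# Example: path graph 0 -3- 1 -1- 2 -4- 3 (hypothetical capacities).
path = [[0, 3, 0, 0], [3, 0, 1, 0], [0, 1, 0, 4], [0, 0, 4, 0]]
parent, weight = cut_tree(path)
print(tree_min_cut(parent, weight, 0, 3))  # -> 1
```

The tree answers any pairwise minimum-cut query with a path lookup after only n - 1 max-flow computations, which is the property that makes cut trees attractive compared with running a separate max-flow for every pair.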

Clustering Methods Based on Minimum-Cut Trees

Author: Gary Flake
Kostas Tsioutsiouliklis
Robert Tarjan

In this paper we introduce a simple clustering method for undirected graphs. The clustering method uses maximum flow techniques on the link-structure of the graph. The quality of the produced clusters is bounded by strong minimum-cut and expansion criteria. We also present a framework for hierarchical clustering and apply it to real-world data. We conclude that the clustering algorithms satisfy strong theoretical criteria and perform well in practice.

CiteSeerX

"8 Amazing Secrets for Getting More Clicks": Detecting Clickbaits in News Streams Using Article Informality

Author: Biyani Prakhar
Blackmer John
Tsioutsiouliklis Kostas
Publication venue: Association for the Advancement of Artificial Intelligence
Publication date: 21/02/2016

Clickbaits are articles with misleading titles that exaggerate the content on the landing page. Their goal is to entice users to click on the title in order to monetize the landing page. The content on the landing page is usually of low quality. Their presence in the homepage streams of news aggregator sites (e.g., Yahoo News, Google News) may adversely impact user experience. Hence, it is important to identify and demote or block them on homepages. In this paper, we present a machine-learning model to detect clickbaits. We use a variety of features and show that the degree of informality of a webpage (as measured by different metrics) is a strong indicator of it being a clickbait. We conduct extensive experiments to evaluate our approach and analyze properties of clickbait and non-clickbait articles. Our model achieves high performance (74.9% F-1 score) in predicting clickbaits.

Association for the Advancement of Artificial Intelligence: AAAI Publications

Linguistic Redundancy in Twitter

Author: Fabio Massimo Zanzotto
Kostas Tsioutsiouliklis
Marco Pennacchiotti
Publication date: 01/07/2011

In the last few years, the interest of the research community in micro-blogs and social media services, such as Twitter, has been growing exponentially. Yet, so far not much attention has been paid to a key characteristic of micro-blogs: the high level of information redundancy. The aim of this paper is to systematically approach this problem by providing an operational definition of redundancy. We cast redundancy in the framework of Textual Entailment Recognition. We also provide quantitative evidence on the pervasiveness of redundancy in Twitter, and describe a dataset of redundancy-annotated tweets. Finally, we present a general-purpose system for identifying redundant tweets. An extensive quantitative evaluation shows that our system successfully solves the redundancy detection task, improving over baseline systems with statistical significance.

CiteSeerX

ART

Rule-based Word Clustering for Document Metadata Extraction

Author: Eren Manavoglu et al.
Hui Han
Kostas Tsioutsiouliklis
Publication date: 01/01/2005

Text classification is still an important problem for unlabeled text

CiteSeerX